Models for Socio-Environmental Data
JAGS Primer Lab
June 06, 2019

Motivation for using KnitR

You can complete all the coding exercises in the JAGS Primer using simple R scripts. However, you might be wondering about how we created the JAGS Primer key you are looking at right now. We did this using Yihui Xie’s knitr package in R. This can be a highly useful tools for organizing your Bayesian analyses. Within the same document, you can:

Best of all, knitr can produce beautiful html files as output, which can be easily shared with collaborators. We encourage you to become familiar with knitr. We recommend Karl Broman’s knitr in a nutshell as an excellent introductory tutorial.



Exercise 1: Factoring

There is no x in the posterior distribution in equation 4. What are assuming if x is absent? Draw the Bayesian network, or DAG, for this model. Use the chain rule to fully factor the joint distribution into sensible parts then simplify by assuming that \(r\), \(K\) and \(\tau\) are independent.



Exercise 2: Can you improve these priors?

A recurring theme in this course will be to use priors that are informative whenever possible. The gamma priors in equation 4 of the JAGS Primer include the entire number line \(>0\). Don’t we know more about population biology than that? Lets, say for now that we are modeling the population dynamics of a large mammal. How might you go about making the priors on population parameters more informative?



Exercise 3: Using for loops

Write a code fragment to set vague normal priors for 5 regression coefficients – dnorm(0, 10E-6) – stored in the vector b.



Exercise 4: Coding the logistic regression

Write R code (algorithm 3) to run the JAGS model (algorithm 3) and estimate the parameters, \(r\), \(K\) \(\sigma\), and \(\tau\). We suggest you insert the JAGS model into this R script using the sink() command as shown in algorithm 4. because this model is small. You will find this a convenient way to keep all your code in the same R script. For larger models, you will be happier using a separate file for the JAGS code. Here is the joint distribution for our logistic model again, with the priors updated from exercise 2 and \(\tau\) expressed as a derived quantity:

\[\begin{eqnarray} \mu_{i} & = & r-\frac{rx_{i}}{K}\textrm{,}\nonumber\\[1em] \tau & = & \frac{1}{\sigma^{2}}\textrm{,}\nonumber\\[1em] \left[r,K,\sigma\mid\mathbf{y}\right] & \propto & \prod_{i=1}^{n}\textrm{normal}\left(y_{i} \mid \mu_{i},\tau\right)\textrm{uniform}\left(K\mid 0,4000\right) \textrm{uniform}\left(\sigma\mid 0, 5\right) \textrm{uniform}\left(r\mid 0,2\right)\textrm{.}\nonumber\\ \end{eqnarray}\]



Exercise 5: Understanding coda ojects

  1. Look at the first six rows of the data frame.
  2. Find the maximum value of \(\sigma\).
  3. Find the mean of r for the first 1000 iterations.
  4. Find the mean of \(r\) after the first 1000 iterations.
  5. Make two publication quality plots of the marginal posterior density of K, one as a smooth curve and the other as a histogram.
  6. Compute the probability that K < 1600. Hint: what type of probability distribution would you use for this computation? Investigate the the dramatically useful R function ecdf().
  7. Compute the probability that 1000 < K < 1300.
  8. Compute the .025 and .975 quantiles of K. Hint–use the R quantile() function.



Exercise 6: Summarizing coda objects

Summarize the coda output from the logistic model with 4 significant digits. Include Rhat and effective sample size diagnostics (more about these soon). Summarize the coda output for \(r\) alone.



Exercise 7: Using MCMCchains

Extract the chains for r and sigma as a data frame using MCMCchains and compute their .025 and .975 quantiles from the extracted object. Display three significant digits. Make a publication quality histogram for the chain for sigma. Indicate the .025 and .975 Bayesian equal-tailed credible interval value with vertical red lines. Overlay the .95 highest posterior density interval with vertical lines in blue. This is a “learn on your own” problem intended to allow you to rehearse the most important goal of this class: being sufficiently confident to figure things out. Load the package HDinterval, enter HDInterval at the console and follow your nose. Also see Hobbs and Hooten Figure 8.2.1.



Exercise 8: Plotting data and model predictions using the MCMCpstr formatting

For the logistic model:

  1. Plot the observations of growth rate as a function of observed population size.
  2. Overlay the median of the model predictions as a solid line.
  3. Overlay the 95% equal-tailed credible intervals as dashed lines in red.
  4. Overlay the 95% highest posterior density intervals as dashed lines in blue.
  5. What do you note about these two intervals? Will this always be the case? Why or why not?
  6. What do the dashed lines represent? What inferential statement can we make about relative these lines?



Exercise 9: Creating caterpillar plots with MCMCplot

Use the MCMCplot function to create caterpillar plots to visualize the posterior densities for \(r\), and \(\sigma\) using the coda object zm from earlier. Then use MCMCplot() to explore different plotting options. There are lots of these options, including ways to make the plots publication quality, show overlap with zero, etc.



Exercise 10: Assessing convergence

Rerun the logistic model with n.adapt = 100. Then do the following:

  1. Keep the next 500 iterations. Assess convergence visually with MCMCtrace() and with the Gelman-Rubin, Heidelberger and Welch, and Raftery diagnostics.
  2. Update another 500 iterations and then keep 500 more iterations. Repeat your assessment of convergence.
  3. Repeat steps 1 and 2 until you feel you have reached convergence.
  4. Change the adapt phase to zero and repeat steps 1 – 4. What happens?



ADVANCED Exercise 11: Coding the logistic regression to run in parallel

Append R code (algorithm 5) to the script you made in exercise 4 to run the JAGS model (algorithm 2) in parallel and estimate the parameters, \(r\), \(K\) \(\sigma\), and \(\tau\). Use the proc.time() function in R to compare the time required for the sequential and parallel JAGS run. If your computer has 3 threads, try running only 2 chains in parallel when doing this exercise. If you have fewer than 3 threads, work with a classmate that has at least 3 threads.


You might be very tempted, in the name of reproducible science, to set the seed inside each worker to the same value. What happens when you do this? Compare the results from this run to the previous run using MCMCsummary() and MCMC(trace).


Repeat this exercise with fixed initial values using algorithm 6.